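Deep learning (DL) techniques have been widely used for medical image classification. Most DL-based classification networks are hierarchically structured and are optimized by minimizing a single loss function measured at the end of the network. However, such a single-loss design may drive the optimization toward one specific value of interest while failing to exploit informative features from intermediate layers, which could improve classification performance and reduce the risk of overfitting. Recently, auxiliary convolutional neural networks (AuxCNNs) have been employed on top of traditional classification networks to facilitate the training of intermediate layers and thereby improve classification performance and robustness. In this study, we propose an adversarial-learning-based AuxCNN to support the training of deep neural networks for medical image classification. Our AuxCNN classification framework adopts two main innovations. First, the proposed AuxCNN architecture includes an image generator and an image discriminator for extracting more informative image features for medical image classification, motivated by the concept of generative adversarial networks (GANs) and their impressive ability to approximate target data distributions. Second, a hybrid loss function is designed to guide model training by incorporating the different objectives of the classification network and the AuxCNN, so as to reduce overfitting. Comprehensive experimental studies demonstrate the superior classification performance of the proposed model. The effect of network-related factors on classification performance is also investigated.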
translated by 谷歌翻译
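Molecular representation learning contributes to multiple downstream tasks such as molecular property prediction and drug design. To properly represent molecules, graph contrastive learning is a promising paradigm because it exploits self-supervision signals and requires no human annotations. However, prior works fail to incorporate fundamental domain knowledge into graph semantics and thus ignore the correlations between atoms that share common attributes but are not directly connected by bonds. To address these issues, we construct a Chemical Element Knowledge Graph (KG) to summarize microscopic associations between elements and propose a novel Knowledge-enhanced Contrastive Learning (KCL) framework for molecular representation learning. The KCL framework consists of three modules. The first module, knowledge-guided graph augmentation, augments the original molecular graph based on the Chemical Element KG. The second module, knowledge-aware graph representation, extracts molecular representations with a common graph encoder for the original molecular graph and a Knowledge-aware Message Passing Neural Network (KMPNN) to encode the complex information in the augmented molecular graph. The final module is a contrastive objective, in which we maximize agreement between these two views of molecular graphs. Extensive experiments show that KCL achieves superior performance over state-of-the-art baselines on eight molecular datasets. Visualization experiments properly interpret what KCL has learned from atoms and attributes in the augmented molecular graphs. Our code and data are available in the supplementary materials.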
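As a rough illustration of what a contrastive objective that maximizes agreement between two views can look like, the sketch below computes an InfoNCE-style loss over paired embeddings. This is an assumption for illustration only: the abstract does not state which contrastive loss KCL uses, and the names `cosine`, `info_nce`, `view_a`, `view_b`, and `temperature` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(view_a, view_b, temperature=0.5):
    """Mean InfoNCE-style loss over a batch of paired embeddings.

    Assumption (not from the abstract): row i of view_a and row i of view_b
    embed the same molecule (a positive pair), and every other row of view_b
    acts as a negative for row i of view_a.
    """
    n = len(view_a)
    total = 0.0
    for i in range(n):
        logits = [cosine(view_a[i], view_b[j]) / temperature for j in range(n)]
        # Numerically stable log-sum-exp over all candidates.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
        # Negative log-probability assigned to the true partner.
        total += log_denominator - logits[i]
    return total / n


# Toy embeddings: two molecules, two views each.
original_view = [[1.0, 0.0], [0.0, 1.0]]
aligned_view = [[1.0, 0.0], [0.0, 1.0]]      # agrees with the original view
mismatched_view = [[0.0, 1.0], [1.0, 0.0]]   # partners swapped

aligned_loss = info_nce(original_view, aligned_view)
mismatched_loss = info_nce(original_view, mismatched_view)
```

In this toy setup the loss is lower when the two views of each molecule agree than when the partners are swapped, which is the behavior a training loop would exploit.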
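Characterizing the sensing and communication performance tradeoff in integrated sensing and communication (ISAC) systems is challenging in applications of learning-based human motion recognition. This is because of the large experimental datasets and the black-box nature of deep neural networks. This paper presents SDP3, a simulation-driven performance predictor and optimizer, which consists of an SDP3 data simulator, an SDP3 performance predictor, and an SDP3 performance optimizer. Specifically, the SDP3 data simulator generates vivid wireless sensing datasets in a virtual environment, the SDP3 performance predictor predicts the sensing performance based on a function regression method, and the SDP3 performance optimizer analytically investigates the sensing and communication performance tradeoff. The results show that the simulated sensing dataset closely matches the experimental dataset in motion recognition accuracy. By leveraging SDP3, it is found that the achievable region of recognition accuracy and communication throughput consists of a communication saturation zone, a sensing saturation zone, and a communication-sensing adversarial zone, and that the desired balanced performance for ISAC systems lies in the third one.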
The dominant multi-camera 3D detection paradigm is based on explicit 3D feature construction, which requires complicated indexing of local image-view features via 3D-to-2D projection. Other methods implicitly introduce geometric positional encoding and perform global attention (e.g., PETR) to build the relationship between image tokens and 3D objects. The 3D-to-2D perspective inconsistency and global attention lead to a weak correlation between foreground tokens and queries, resulting in slow convergence. We propose Focal-PETR, which uses instance-guided supervision and a spatial alignment module to adaptively focus object queries on discriminative foreground regions. Focal-PETR additionally introduces a down-sampling strategy to reduce the cost of global attention. Owing to the highly parallelized implementation and the down-sampling strategy, our model, without depth supervision, achieves leading performance on the large-scale nuScenes benchmark and a superior speed of 30 FPS on a single RTX3090 GPU. Extensive experiments show that our method outperforms PETR while consuming 3x fewer training hours. The code will be made publicly available.
Video super-resolution (VSR), which aims to reconstruct a high-resolution (HR) video from its low-resolution (LR) counterpart, has made tremendous progress in recent years. However, it remains challenging to deploy existing VSR methods on real-world data with complex degradations. On the one hand, there are few well-aligned real-world VSR datasets, especially with large super-resolution scale factors, which limits the development of real-world VSR tasks. On the other hand, alignment algorithms in existing VSR methods perform poorly on real-world videos, leading to unsatisfactory results. As an attempt to address these issues, we build a real-world 4$\times$ VSR dataset, namely MVSR4$\times$, where low- and high-resolution videos are captured with different focal length lenses of a smartphone, respectively. Moreover, we propose an effective alignment method for real-world VSR, namely EAVSR. EAVSR uses the proposed multi-layer adaptive spatial transform network (MultiAdaSTN) to refine the offsets provided by the pre-trained optical flow estimation network. Experimental results on the RealVSR and MVSR4$\times$ datasets show the effectiveness and practicality of our method, and we achieve state-of-the-art performance on the real-world VSR task. The dataset and code will be publicly available.
Variational Graph Autoencoders (VGAEs) are powerful models for unsupervised learning of node representations from graph data. In this work, we systematically analyze modeling node attributes in VGAEs and show that attribute decoding is important for node representation learning. We further propose a new learning model, interpretable NOde Representation with Attribute Decoding (NORAD). The model encodes node representations in an interpretable approach: node representations capture community structures in the graph and the relationship between communities and node attributes. We further propose a rectifying procedure to refine node representations of isolated nodes, improving the quality of these nodes' representations. Our empirical results demonstrate the advantage of the proposed model when learning graph data in an interpretable approach.
In this paper, we explore the feasibility of utilizing a mmWave radar sensor installed on a UAV to reconstruct the 3D shapes of multiple objects in a space. The UAV hovers at various locations in the space, and its onboard radar sensor collects raw radar data by scanning the space with Synthetic Aperture Radar (SAR) operation. The radar data is sent to a deep neural network model, which outputs the point cloud reconstruction of the multiple objects in the space. We evaluate two different models. Model 1 is our recently proposed 3DRIMR/R2P model, and Model 2 is formed by adding a segmentation stage in the processing pipeline of Model 1. Our experiments have demonstrated that both models are promising in solving the multiple object reconstruction problem. We also show that Model 2, despite producing denser and smoother point clouds, can lead to higher reconstruction loss or even loss of objects. In addition, we find that both models are robust to the highly noisy radar data obtained by unstable SAR operation due to the instability or vibration of a small UAV hovering at its intended scanning point. Our exploratory study has shown a promising direction of applying mmWave radar sensing in 3D object reconstruction.
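Optimal transport (OT) offers a versatile framework for comparing complex data distributions in a geometrically meaningful way. Traditional methods for computing the Wasserstein distance and geodesic between probability measures require mesh-dependent domain discretization and suffer from the curse of dimensionality. We present GeONet, a mesh-invariant deep neural operator network that learns the non-linear mapping from an input pair of initial and terminal distributions to the Wasserstein geodesic connecting the two endpoint distributions. In the offline training stage, GeONet learns the saddle point optimality conditions for the dynamic formulation of the OT problem in the primal and dual spaces, which are characterized by a coupled PDE system. The subsequent inference stage is instantaneous and can be deployed for real-time predictions in the online learning setting. We demonstrate that GeONet achieves testing accuracy comparable to that of standard OT solvers on simulation examples and the CIFAR-10 dataset, with considerably reduced inference-stage computational cost.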
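To make the notions of Wasserstein distance and Wasserstein geodesic concrete, the sketch below handles the simplest special case: two one-dimensional empirical distributions with the same number of equally weighted samples, where the optimal coupling is obtained by sorting. It is not GeONet and not the dynamic PDE formulation described above; the function names `wasserstein_1d` and `geodesic_1d` are hypothetical.

```python
def wasserstein_1d(xs, ys, p=2):
    """p-Wasserstein distance between two equal-size 1D empirical samples.

    In one dimension with equal weights, the optimal transport plan matches
    the i-th smallest point of xs to the i-th smallest point of ys.
    """
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equally sized samples")
    pairs = zip(sorted(xs), sorted(ys))
    mean_cost = sum(abs(a - b) ** p for a, b in pairs) / len(xs)
    return mean_cost ** (1.0 / p)


def geodesic_1d(xs, ys, t):
    """Samples of the distribution at time t in [0, 1] along the geodesic.

    Each matched pair moves along a straight line (displacement
    interpolation), so t=0 returns sorted xs and t=1 returns sorted ys.
    """
    return [(1.0 - t) * a + t * b for a, b in zip(sorted(xs), sorted(ys))]


initial = [0.0, 1.0, 2.0]
terminal = [3.0, 5.0, 4.0]

distance = wasserstein_1d(initial, terminal)
midpoint = geodesic_1d(initial, terminal, 0.5)
half_distance = wasserstein_1d(initial, midpoint)
```

The midpoint lies at half the total distance from the initial distribution, reflecting the constant-speed property of the geodesic. In higher dimensions no such sorting shortcut exists, which is where discretization-based solvers become expensive.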
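Recent advances in remote patient monitoring (RPM) systems make it possible to recognize various human activities and to measure vital signs, including subtle motions of superficial vessels. There is growing interest in applying artificial intelligence (AI) to this area of healthcare by addressing known limitations and challenges, such as predicting and classifying vital signs and physical movements, which are considered crucial tasks. Federated learning is a relatively new AI technique designed to enhance data privacy by decentralizing traditional machine learning modeling. However, traditional federated learning requires models with identical architectures to be trained on the local clients and the global server. This constrains the global model architecture because local models cannot be heterogeneous. To overcome this, this study proposes a novel federated learning architecture, FedStack, which supports ensembling client models with heterogeneous architectures. This work offers a privacy-protected system for monitoring hospitalized in-patients in a decentralized manner and identifies the optimal sensor placement. The proposed architecture was applied to a mobile health sensor benchmark dataset from 10 different subjects to classify 12 routine activities. Three AI models, ANN, CNN, and Bi-LSTM, were trained on individual subject data. The federated learning architecture was applied to these models to build local and global models capable of state-of-the-art performance. The local CNN model outperformed the ANN and Bi-LSTM models on each subject's data. Our proposed work demonstrates that heterogeneous stacking of local models performs better than homogeneous stacking. This work lays the foundation for an enhanced RPM system that incorporates client privacy to assist with clinical observation of patients in acute mental health facilities and ultimately help prevent unexpected deaths.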
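Reconstructing 3D hand meshes from monocular RGB images has attracted increasing attention because of its enormous potential for applications in AR/VR. Most state-of-the-art methods attempt to tackle this task in an anonymous manner. Specifically, the identity of the subject is ignored even though it is available in practice in real applications where the user does not change during a continuous recording session. In this paper, we propose an identity-aware hand mesh estimation model, which can incorporate the identity information represented by the intrinsic shape parameters of the subject. We demonstrate the importance of the identity information by comparing the proposed identity-aware model with a baseline that treats the subject anonymously. Furthermore, to handle the use case in which the test subject is unseen, we propose a novel personalization pipeline that calibrates the intrinsic shape parameters using only a few unlabeled RGB images of the subject. Experiments on two large-scale public datasets validate the state-of-the-art performance of our proposed method.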